[SCHEMA] Add TSV column files #827

tsalo · 2021-07-06T16:58:56Z

Closes None.

Tagging @kayrobbins so she can take a look.

Changes proposed:

Add a YAML file describing columns for TSV files in the specification.
Add a macro and Python function to render column tables.
Replace hardcoded tables in the specification with macro-generated ones.
- Currently, the generated tables are provided in addition to the hardcoded tables, for easy comparison during review.

To do:

Add columns from [ENH] BEP031 - New columns to participants.tsv file #816 once it's merged.
Remove hardcoded tables, after the necessary approvals.

CPernet · 2021-07-09T06:03:59Z

Can I ask why YAML is being used? For many people BIDS is scary just because we have TSV and JSON ... this adds yet another layer. I think we need to be careful.

tsalo · 2021-07-09T06:24:20Z

The main reason we chose yaml over json is that yaml allows comments. Also, the schema files describing the specification aren't things that BIDS users will interact with, any more than they spend time with the pybids or bids-validator code.
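To illustrate the point about comments, here is a minimal sketch that uses only Python's standard `json` module. The helper `parses_as_json` and both sample strings are made up for illustration and are not part of the schema code. It only shows that a standard JSON parser rejects a commented document; YAML, by contrast, treats `#` lines as comments.

```python
import json

# A JSON-like document carrying a "//" comment, which the JSON grammar does not allow.
commented_json = """
{
    // column used in participants.tsv
    "name": "participant_id"
}
"""

# The same content without the comment.
plain_json = '{"name": "participant_id"}'


def parses_as_json(text: str) -> bool:
    """Return True if the standard library JSON parser accepts the text."""
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return False
    return True


print(parses_as_json(plain_json))
print(parses_as_json(commented_json))
```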

CPernet · 2021-07-09T06:33:34Z

ah I see - ok cool thx, just wanted to know more

Remi-Gau · 2021-07-09T06:54:57Z

I made a mental note yesterday, though, that I will need to double-check how splitting content across several files and formats might affect the onboarding of people who want to contribute to the specification (especially for BEP content, but also in general). I will try to remember to raise the issue at the next maintainers meeting I attend.

erdalkaraca · 2021-08-17T19:34:19Z

> I made a mental note yesterday, though, that I will need to double-check how splitting content across several files and formats might affect the onboarding of people who want to contribute to the specification (especially for BEP content, but also in general). I will try to remember to raise the issue at the next maintainers meeting I attend.

Normally (speaking from experience), a technical schema is more or less self-contained. Indeed, the many YAML files may scare off contributors, but, as Taylor mentioned, they are intended for automatic processing via tooling.

Tokazama · 2021-08-26T18:00:23Z

If the intent is to provide a machine-readable method, spreading the schema across multiple files and directories on GitHub makes that more difficult. It's pretty simple to download raw files, but the easiest way to manage the many schema files is to clone the whole repo. It's not an unmanageable obstacle, just a bit obnoxious.

tsalo · 2021-09-22T20:48:10Z

I will consolidate the column files into a single file if #883 is merged. I'm planning to hold off on changing the structure in this PR until that PR is handled.

effigies · 2021-09-22T21:00:42Z

@tsalo Apologies for taking so long. Which PR should be considered higher priority?

tsalo · 2021-09-22T21:05:03Z

This PR contains content that needs to be reviewed, while the other really just needs a quick once-over from anyone who uses the schema. The changes in #883 include updates to the internal schema-processing code, so nothing in the specification should really change there.

In summary, I think that #883 is higher priority, but this PR will need a more thorough review.

tsalo · 2021-09-24T18:46:20Z

@effigies just a small note to follow up from our call earlier this week. It looks like the JSON fields used to describe TSV columns (e.g., "LongName", "Description", "Levels") are included in the metadata YAML files, so they won't need to be added here.

src/schema/columns/type.yaml

Remi-Gau

Not sure if you are planning to add more to this but so far this looks good to me.

effigies

Sorry, I keep starting and having to drop this. Here are some comments in my queue before I start a new review.

effigies · 2021-10-06T14:48:11Z

src/schema/objects/columns.yaml

+  - cell line
+  - in vitro differentiated cells
+  - primary cell
+  - cell-free sample
+  - cloning host
+  - tissue
+  - whole organisms
+  - organoid
+  - technical sample


Can we automatically turn this into the following?

One of "cell line", "in vitro differentiated cells", [...]

I think automatically incorporating value restrictions like enums would be awesome. I'd prefer to tackle it in a separate PR though.

I've opened #912 about this.
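As a rough sketch of the rendering being asked for here, the snippet below turns a list of allowed values into the "One of ..." phrasing. The function name `describe_enum` is hypothetical, and this is not the approach taken in #912; it only illustrates the idea using the first three values from the diff above.

```python
def describe_enum(values):
    """Render allowed values as: One of "a", "b", ..."""
    # Wrap each value in double quotes and join with commas.
    quoted = ", ".join(f'"{value}"' for value in values)
    return f"One of {quoted}"


# First three allowed values from the diff above.
sample_types = ["cell line", "in vitro differentiated cells", "primary cell"]
rendered = describe_enum(sample_types)
print(rendered)
```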

src/schema/objects/columns.yaml

effigies · 2021-10-06T14:58:05Z

src/schema/objects/columns.yaml

+    Hexadecimal. Label color for visualization.
+  type: string
+  unit: hexadecimal
+derived_from:


While the column header "derived_from" seems fine, perhaps we should have a more descriptive term name:

Suggested change

-derived_from:
+sample_derived_from:

My current approach to multiply-defined objects is to start with the base name and only create a new definition if a new use for that object is proposed with a definition that is incompatible with the original. I'd prefer not to link objects to specific datatypes or sections until it becomes necessary, but others may feel differently.

When we do have multiple definitions of the same object, I want to make sure that the names appear together and that it's clear where the difference from the "name" comes from, so I would lean toward something like derived_from__sample. I added a section to the schema README about duplicate terms with different definitions, so hopefully that will be more clear now.

I think this is the last remaining blocker, and what we decide here should apply across the schema. Here is what I added to the schema README. @effigies @Remi-Gau @sappelhoff WDYT?

If an object may mean something different depending on where it is used within the specification,
then this must be reflected in the schema.
Specifically, each version of the object must have its own definition within the relevant file.
However, since object files are organized as dictionaries, each object must have a unique key.
Thus, we append a suffix to each re-used object's key in order to make it unique.
For objects with CamelCase names (for example, metadata fields), the suffix will start with a single underscore (_).
For objects with snake_case names, two underscores must be used.

There should also be a comment near the object definition in the YAML file describing the nature of the different objects.

For example, the TSV column "reference" means different things when used for EEG data, as compared to iEEG data.
As such, there are two definitions in columns.yaml for the "reference" column: "reference__eeg" and "reference__ieeg".

# reference column for channels.tsv files for EEG data
reference__eeg:
  name: reference
  description: |
    Name of the reference electrode(s).
    This column is not needed when it is common to all channels.
    In that case the reference electrode(s) can be specified in `*_eeg.json` as `EEGReference`.
  type: string
# reference column for channels.tsv files for iEEG data
reference__ieeg:
  name: reference
  description: |
    Specification of the reference (for example, 'mastoid', 'ElectrodeName01', 'intracranial', 'CAR', 'other', 'n/a').
    If the channel is not an electrode channel (for example, a microphone channel) use `n/a`.
  anyOf:
  - type: string
  - type: string
    enum:
    - n/a

When adding new object definitions to the schema,
every effort should be made to find a shared, common definition for the term, should it already exist.
If the differences between two versions of the same object are subtle or driven by context,
then you can generally append additional text to the object definition within the associated rendered table in the specification,
rather than creating a separate entry in the schema.
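The suffix convention described in this README text can be sketched as a small helper that separates a key into its base name and suffix. The function `split_object_key` is hypothetical, and treating a leading uppercase letter as the sign of a CamelCase key is an assumption made for the sketch. In the schema itself the rendered name comes from each object's `name` field, so nothing like this is required to build the tables.

```python
def split_object_key(key):
    """Split a schema key into (base name, suffix), with suffix None if absent."""
    # snake_case objects: the suffix follows a double underscore.
    if "__" in key:
        base, suffix = key.split("__", 1)
        return base, suffix
    # CamelCase objects (assumed to start with an uppercase letter):
    # the suffix follows a single underscore.
    if "_" in key and key[0].isupper():
        base, suffix = key.split("_", 1)
        return base, suffix
    # Anything else is an ordinary key with no suffix.
    return key, None


print(split_object_key("reference__eeg"))
print(split_object_key("acq_time"))
```

Here `Reference_eeg`, used in the checks for this sketch, would be a made-up CamelCase key; the thread does not give a real example of one.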

effigies

Here's a proposal for including common text in both samples and sessions.

src/03-modality-agnostic-files.md

src/schema/objects/columns.yaml

effigies · 2021-10-26T16:26:03Z

src/schema/objects/columns.yaml

+    of the strain of the species, for example: `RRID:IMSR_JAX:000664`.
+  type: string
+  format: rrid
+time:


Seems like a good idea to give this a specific name like blood_sample_time.

effigies · 2021-10-26T16:29:28Z

Since I think @sappelhoff covered *EG, I read all rendered pages except that.

I have not gone through the actual schema file in detail or the Python (at least recently). Please let me know if you'd like me to.

Tokazama · 2021-10-26T18:47:45Z

src/03-modality-agnostic-files.md

+{{ MACROS___make_columns_table(
+   {
+      "participant_id": ("REQUIRED", "There MUST be exactly one row for each participant."),
+      "species": "RECOMMENDED",
+      "age": "RECOMMENDED",
+      "sex": "RECOMMENDED",
+      "handedness": "RECOMMENDED",
+      "strain": "RECOMMENDED",
+      "strain_rrid": "RECOMMENDED",
+   }
+) }}


Is it accurate to say that this macro represents a BIDS rule for the `participants.tsv` file? If so, should this be described in the schema somewhere?

Yes, the tables do represent rules, but those rules are too complicated to be translated directly from the tables. For more information on the types of mechanisms we need to support in the schema to accurately codify the specification's rules, please see #620.
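For readers unfamiliar with the macro call in the diff above, here is a minimal sketch of how a table could be rendered from such a dictionary. It is not the function added in this PR: the `COLUMNS` dict is a stand-in for entries in columns.yaml with placeholder descriptions, and the table layout is assumed. The one behavior taken from the diff is that a value may be either a requirement level or a `(level, extra text)` tuple.

```python
# Stand-in for a few columns.yaml entries; the descriptions are placeholders.
COLUMNS = {
    "participant_id": {"name": "participant_id", "description": "A participant identifier."},
    "age": {"name": "age", "description": "Numeric value in years."},
}


def make_columns_table(column_info, columns=COLUMNS):
    """Render a Markdown table from {key: level} or {key: (level, extra text)}."""
    lines = [
        "| Column name | Requirement level | Description |",
        "|---|---|---|",
    ]
    for key, info in column_info.items():
        # A tuple carries extra text to append to the schema description.
        if isinstance(info, tuple):
            level, extra = info
        else:
            level, extra = info, ""
        definition = columns[key]
        parts = [definition["description"], extra]
        description = " ".join(part for part in parts if part)
        lines.append(f"| {definition['name']} | {level} | {description} |")
    return "\n".join(lines)


table = make_columns_table(
    {
        "participant_id": ("REQUIRED", "There MUST be exactly one row for each participant."),
        "age": "RECOMMENDED",
    }
)
print(table)
```

Appending the extra text in the table, rather than in the schema entry, matches the README guidance above for differences that are subtle or driven by context.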

sappelhoff

I am happy with this, thanks Taylor.

tsalo added 4 commits April 9, 2021 22:10

Add template.

84bfd06

Add first column.

066e2a2

Add columns.

10816d7

More terms.

19901cd

tsalo added 2 commits July 9, 2021 11:48

Fill name field in all files.

2a48ef6

More work.

2451b9a

Note that there are three very different uses of "name" columns, and two of them are equally common, so I chose not to specify any of them as the "canonical" definition.

tsalo added the schema label ("Issues related to the YAML schema representation of the specification. Patch version release.") Jul 22, 2021

tsalo added 4 commits August 31, 2021 13:55

Merge branch 'master' into tsv-columns-schema

de00cea

Add remaining definitions.

54458cd

Add macro to render column tables.

b5754e7

Fix YAML file.

42d6e08

tsalo marked this pull request as ready for review August 31, 2021 18:51

tsalo requested review from effigies and sappelhoff as code owners August 31, 2021 18:51

tsalo mentioned this pull request Sep 13, 2021

Consolidate schema files into a single file for each term type #877

Closed

Consolidate suffixes file.

034c6a8

Remi-Gau reviewed Oct 5, 2021

src/schema/columns/type.yaml (outdated)

Remi-Gau approved these changes Oct 5, 2021

tsalo added 2 commits October 5, 2021 14:04

Merge branch 'master' into tsv-columns-schema

b0b2c3f

Remove old individual files.

4988676

effigies reviewed Oct 26, 2021

src/03-modality-agnostic-files.md (outdated)

src/03-modality-agnostic-files.md

src/schema/objects/columns.yaml (outdated)

effigies reviewed Oct 26, 2021

src/schema/objects/columns.yaml (outdated)

effigies reviewed Oct 26, 2021

src/schema/objects/columns.yaml (outdated)

effigies reviewed Oct 26, 2021

tsalo and others added 8 commits October 26, 2021 14:01

Apply suggestions from code review

fd54609

Co-authored-by: Chris Markiewicz <[email protected]>

Use two underscores to delineate multiply-defined columns.

efa6fa1

Remove text that is now in table.

0a4880f

Update src/schema/objects/columns.yaml

48656fc

Co-authored-by: Chris Markiewicz <[email protected]>

Merge branch 'tsv-columns-schema' of https://github.com/tsalo/bids-specification into tsv-columns-schema

7ff360a

Add sections to README on columns file and on reused terms.

de5b5f6

Merge branch 'master' into tsv-columns-schema

ba84f44

Add EDF info to acq_time definition.

4ba9c25

Tokazama reviewed Oct 26, 2021

tsalo mentioned this pull request Oct 27, 2021

Render valid value restrictions in tables based on object definitions in schema #912

Closed

tsalo requested review from sappelhoff and effigies November 2, 2021 16:41

sappelhoff approved these changes Nov 2, 2021

tsalo mentioned this pull request Nov 2, 2021

[ENH] Generate glossary page from schema #923

Merged

effigies approved these changes Nov 9, 2021

tsalo added 2 commits November 9, 2021 12:06

Remove hardcoded tables.

b510b7d

Remove unused links.

d562986

sappelhoff added this to the 1.7.0 milestone Nov 9, 2021

tsalo merged commit 5400d6f into bids-standard:master Nov 9, 2021

tsalo deleted the tsv-columns-schema branch November 9, 2021 19:02

tsalo mentioned this pull request Nov 10, 2021

Pull BIDS object definitions directly from the BIDS schema NIDM-Terms/terms#149

Open

tsalo added the schema-code label ("Updates or changes to the code used to parse, filter, and render the schema.") Apr 11, 2022

effigies mentioned this pull request Mar 15, 2023

Convert any sample columns in events files to integers. bids-standard/bids-examples#356

Closed

effigies mentioned this pull request May 24, 2024

feat(schema): Provide default JSON column definition for "conventional" columns #1838

Merged
